Fix model-downloader and tgi in multi shard case #642
Conversation
Approved.
Cache being in emptyDir, i.e. going away when the pod instance goes away (instead of being shared like model data), is more secure, but it can be a significant pod startup performance issue, especially with HPA and in other setups where pods come and go. That can be looked at in another PR, though.
I think in the long term it would be better to separate model downloading from running the application services. That would allow model downloading to be centralized in a single service / container instead of being split over multiple pods, and even to be done as a separate step before starting any of the application pods.
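The separate-download-step suggestion could be sketched as a one-shot Kubernetes Job that populates a shared volume before any application pods start. This is only an illustration of the idea, not code from this chart; all resource names and the secret layout below are hypothetical.

```yaml
# Hypothetical sketch: a one-shot Job that downloads the model into a shared
# PVC before the application pods are started. Names are illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: model-downloader        # hypothetical name
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: downloader
          image: huggingface/downloader:0.17.3  # or an image with huggingface-hub >= 0.26.5
          env:
            - name: HF_TOKEN    # read by huggingface-hub for gated models
              valueFrom:
                secretKeyRef:
                  name: hf-token    # hypothetical secret
                  key: token
          volumeMounts:
            - name: model-volume
              mountPath: /data
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: model-pvc  # hypothetical PVC shared with the serving pods
```

Application pods would then mount the same PVC read-only, so model data is fetched once rather than once per pod.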
LGTM.
Signed-off-by: Lianhao Lu <[email protected]>
Fix issue opea-project#639
Signed-off-by: Lianhao Lu <[email protected]>
Description
Upgrade huggingface-hub to version 0.26.5 when downloading models, because the existing
huggingface/downloader:0.17.3
image doesn't honor the HF_TOKEN correctly. Loosen the tgi securityContext to allow running with multiple shards.
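As a hedged illustration of the securityContext change: TGI in multi-shard mode launches multiple worker processes that communicate over shared memory, which a fully locked-down pod securityContext can block. The exact fields changed in this PR may differ; the fragment below is only a sketch of the kind of relaxation the description refers to.

```yaml
# Illustrative only -- the actual fields touched by this PR may differ.
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  # Relaxed so that multi-shard workers can write temporary/shared files:
  readOnlyRootFilesystem: false
```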
Issues
Fixes #641
Fixes #639
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
List any newly introduced 3rd-party dependencies.
Tests
Describe the tests that you ran to verify your changes.